Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks

نویسندگان

  • Daan van Esch
  • Mason Chua
  • Kanishka Rao
چکیده

Word pronunciations, consisting of phoneme sequences and the associated syllabification and stress patterns, are vital for both speech recognition and text-to-speech (TTS) systems. For speech recognition phoneme sequences for words may be learned from audio data. We train recurrent neural network (RNN) based models to predict the syllabification and stress pattern for such pronunciations making them usable for TTS. We find these RNN models significantly outperform naive rulebased models for almost all languages we tested. Further, we find additional improvements to the stress prediction model by using the spelling as features in addition to the phoneme sequence. Finally, we train a single RNN model to predict the phoneme sequence, syllabification and stress for a given word. For several languages, this single RNN outperforms similar models trained specifically for either phoneme sequence or stress prediction. We report an exhaustive comparison of these approaches for twenty languages.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robust stability of stochastic fuzzy impulsive recurrent neural networks with\ time-varying delays

In this paper, global robust stability of stochastic impulsive recurrent neural networks with time-varyingdelays which are represented by the Takagi-Sugeno (T-S) fuzzy models is considered. A novel Linear Matrix Inequality (LMI)-based stability criterion is obtained by using Lyapunov functional theory to guarantee the asymptotic stability of uncertain fuzzy stochastic impulsive recurrent neural...

متن کامل

Learning Similarity Functions for Pronunciation Variations

A significant source of errors in Automatic Speech Recognition (ASR) systems is due to pronunciation variations which occur in spontaneous and conversational speech. Usually ASR systems use a finite lexicon that provides one or more pronunciations for each word. In this paper, we focus on learning a similarity function between two pronunciations. The pronunciations can be the canonical and the ...

متن کامل

Learning Similarity Function for Pronunciation Variations

A significant source of errors in Automatic Speech Recognition (ASR) systems is due to pronunciation variations which occur in spontaneous and conversational speech. Usually ASR systems use a finite lexicon that provides one or more pronunciations for each word. In this paper, we focus on learning a similarity function between two pronunciations. The pronunciation can be the canonical and the s...

متن کامل

Solving Linear Semi-Infinite Programming Problems Using Recurrent Neural Networks

‎Linear semi-infinite programming problem is an important class of optimization problems which deals with infinite constraints‎. ‎In this paper‎, ‎to solve this problem‎, ‎we combine a discretization method and a neural network method‎. ‎By a simple discretization of the infinite constraints,we convert the linear semi-infinite programming problem into linear programming problem‎. ‎Then‎, ‎we use...

متن کامل

Artificial neural networks: applications in predicting pancreatitis survival

Artificial neural networks are intelligent systems that have successfully been used for prediction in different medical fields. In this study, the efficiency of a neural network for predicting the survival of patients with acute pancreatitis is compared with days-of-survival obtained from patients. A three- layer back-propagation neural network was developed for this purpose. Clinical data (e.g...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016